<?php
/**
 * <https://y.st./>
 * Copyright © 2018 Alex Yst <mailto:copyright@y.st>
 * 
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <https://www.gnu.org./licenses/>.
**/

$xhtml = array(
	'<{title}>' => 'Learning Journal',
	'<{subtitle}>' => 'MATH 1280: Introduction to Statistics',
	'<{copyright year}>' => '2018',
	'takedown' => '2017-11-01',
	'<{body}>' => <<<END
<section id="Unit1">
	<h2>Unit 1</h2>
	<h3>Daily reflection</h3>
	<h4>2018-09-05</h4>
	<p>
		The first day of a term is always a mad scramble for me.
		I make a habit of making at least one discussion post each day.
		(This post may be in either course; I don&apos;t necessarily make a post in <strong>each</strong> course each day, just one post between the two.)
		We&apos;re not allowed to see the course material until 00:05 in the university&apos;s time zone, which is 22:05 in my own time zone.
		That leaves me with a little less than two hours to read enough of the material in one of my new courses (whose subject matter I may know next to nothing about) to cobble together a coherent and well-thought-out post.
	</p>
	<p>
		Some terms, I catch a break.
		This was one such term.
		The discussion assignment for the week here was to discuss our experience in installing R.
		Instead of reading all of the week&apos;s course material, all I would have to do was install a little software.
		I could accomplish that in two hours.
		No sweat.
	</p>
	<p>
		In my mad rush, I put a bit more effort into getting the thing installed than was really necessary.
		It did get installed though, and seems to function.
	</p>
	<h4>2018-09-06</h4>
	<p>
		Like I said yesterday, I aim for one post per day.
		Having made my initial post in this course yesterday, I spent today reading the assigned material for my other course so I could write up my initial post there.
		I did take the time to look over the reading list for this course.
		One chapter.
		I&apos;ll probably get started on that tomorrow, though I also have an errand and a work shift tomorrow, so it&apos;s unlikely I&apos;ll finish by end of day.
	</p>
	<p>
		I tried to download the textbook using the link in the instructions for the week, but the page at that address says there&apos;s a missing record in the database.
		Someone deleted the textbook and failed to update that link or something.
		A different copy of the textbook is linked to from the course index though, so I did get a copy onto my machine in preparation for tomorrow.
	</p>
	<h4>2018-09-07</h4>
	<p>
		I ended up making a few posts in the discussion forum with what little time I had today instead of getting started on the reading assignment.
		Maybe tomorrow then.
	</p>
	<h4>2018-09-08</h4>
	<p>
		Today, I got next to nothing done.
		I was overwhelmed with all the things I needed to accomplish, both in and out of school, and fizzled out.
		There&apos;s no excuse for that.
	</p>
	<p>
		Part of the problem is that I&apos;m focussing on too many things.
		I had that problem at the beginning of this last break too, so I put together a list of all my goals, how important they are, how urgent they are, and which ones can&apos;t be completed until after other goals have been taken care of.
		It really helped a lot.
		I decided on a few to focus on and put the rest on hold.
		But school wasn&apos;t in session then.
		I had more time.
		Now, I need to put another goal on the back burner for the time being so I have the time and energy to focus on coursework.
	</p>
	<h4>2018-09-09</h4>
	<p>
		Having set aside one of my other goals for now so I can regain focus on coursework, I had the time to read much of the assigned reading material today.
		About half the material was dedicated to justifying the study of statistics.
		I&apos;m a programmer though.
		I use statistics all the time.
		You don&apos;t need to justify their usefulness to me.
		I&apos;m reminded of a mug I saw in a second-hand shop once.
		It said something to the effect of &quot;another day has gone by and I still haven&apos;t needed to use algebra&quot;.
		I saw it and thought to myself: &quot;I used algebra again earlier today. I frequently use algebra for <strong>fun</strong>!&quot;
		The book is quick to point out too that you don&apos;t need to perform calculations in your head.
		You can use a computer or calculator.
		Again, this fits well into what I already do as a programmer.
		As a programmer, any calculation carried out by hand, outside of the planning phases, is wasted effort.
		By combining formulae (statistical or otherwise) with algebra, then making the computer solve for us, we get much more flexible and dynamic results.
		Understanding the logic behind a given formula is vital, but performing the calculation by hand, at least in the field of computer science, is counterproductive.
	</p>
	<h4>2018-09-10</h4>
	<p>
		I finished the reading material and completed the learning journal exercises (below) today.
		The book says that if we experiment with R, the worst that can happen is that the computer could crash or freeze.
		I&apos;m ... hoping that the author doesn&apos;t know what they&apos;re talking about.
		If running R commands can seriously take down the whole computer, that means it&apos;s incredibly poorly programmed and these problems should be fixed.
		Normally, a computer sandboxes all programs, each in its own box, so they don&apos;t accidentally trample on one another.
		It takes something pretty big to break out of one of those boxes and adversely affect the system.
		It is possible.
		I&apos;ve written bugs into my own code occasionally that have frozen my computer; usually I do it by accidentally eating up all the available memory.
		But these are things that should be long fixed in the stable versions of widely-used programs such as R.
		This isn&apos;t acceptable behaviour for released code to have.
		Only highly-experimental, unreleased code can be given a pass on that.
	</p>
	<p>
		The assignment operator in R is a bit strange.
		Most languages use an equals sign.
		In some contexts, some languages use a right-facing ASCII arrow (<code>-&gt;</code>).
		R instead uses a left-facing ASCII arrow (<code>&lt;-</code>).
	</p>
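	<p>
		A quick console check of the arrow styles (it turns out R accepts a right-facing arrow for assignment as well):
	</p>
	<blockquote>
<pre><code>&gt; x &lt;- 5    # the usual left-facing assignment
&gt; 10 -&gt; y   # R also accepts a right-facing arrow
&gt; x + y
[1] 15</code></pre>
	</blockquote>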
	<p>
		Frustratingly, the book doesn&apos;t tell how to save R sessions when running from a bare console.
		It only claims that one can save by attempting to close the console window.
		This may work in console-like windows provided by graphical front ends, but in an actual console window, it doesn&apos;t.
		Closing a real console window simply terminates the program prematurely.
		I doubt we&apos;ll need to save sessions in this course, but if we do, I&apos;ll look up how to do it properly.
	</p>
	<p>
		Unintuitively, it seems R allows full stops in variable names.
		I&apos;ve never seen another programming language that allows that.
	</p>
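	<p>
		A quick check in the console confirms it:
	</p>
	<blockquote>
<pre><code>&gt; my.height &lt;- 170  # a full stop, right in the variable name
&gt; my.height
[1] 170</code></pre>
	</blockquote>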
	<h4>2018-09-11</h4>
	<p>
		I didn&apos;t get much done today due to having a guest over, but I did make sure to get a couple discussion posts submitted.
	</p>
	<h4>2018-09-12</h4>
	<p>
		I got my remaining discussion posts submitted and took the quiz for the week.
		Unfortunately, my time today was minimal, so I had to rush.
		I ended up scoring lower on the quiz than I would have liked.
		If I manage time better next week, maybe I can do better.
	</p>
	<h3>The <code>table()</code> command</h3>
	<p>
		When using the <code>table()</code> command, the first row of numbers represents the data points in the set.
		For example, if there&apos;s a five in the data set, a five will be displayed as one of the top numbers.
		However, if multiple fives are present in the data, that&apos;s still only one thing the dataset contains, so there&apos;s only one five in the top row.
		The bottom row tells how many <strong>instances</strong> of each value exist in the set.
		For example, if there are three fives, there&apos;ll be a five in the top row and a three in the bottom row just beneath it.
	</p>
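	<p>
		A small session demonstrating that layout (the data values are made up for illustration):
	</p>
	<blockquote>
<pre><code>&gt; x &lt;- c(5, 5, 5, 2, 7)
&gt; table(x)
x
2 5 7
1 3 1</code></pre>
	</blockquote>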
	<h3>Parameters versus statistics</h3>
	<p>
		By the definitions used for this course, a parameter is a piece of data about a specific population.
		It&apos;s not calculated based on data from a sample, but is instead the actual, factual data sought.
		In many cases, getting the exact value of a parameter is either not possible or not feasible, so parameters are guessed at based on statistics of a sample, but these guesses won&apos;t be perfect.
		On the other hand, in this course, we&apos;re using the definition that a statistic is the data calculated from a sample.
		Statistics are useful for making educated guesses about parameters, but aren&apos;t themselves the data which we seek.
	</p>
	<p>
		Given the definitions used in this course, the teacher mentioned in the problem that wants to get the average of their students&apos; test scores could use either a parameter or a statistic.
		To use a statistic, they&apos;d take a sample of the test scores, average them, then assume the statistic found was probably close to the parameter they were actually seeking.
		But what&apos;s the point of doing that?
		Statistics are used when obtaining the real parameter isn&apos;t possible or isn&apos;t feasible.
		The described teacher wants to know about the average test score not of the school or the world, but of their one class of thirty students.
		This makes their population the test scores of these specific students, which conveniently, being the grader of these students&apos; tests, is specifically the grades the teacher has the most access to.
		Likely, the grades have been entered into a computer program that can calculate the average of the population, which would be the actual parameter, in milliseconds.
		If not, averaging thirty numbers by hand using a pen and paper isn&apos;t that hard.
		Instead of using a statistic drawn from a sample of the population, it makes much more sense to just compute the parameter itself from the population itself.
	</p>
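	<p>
		To make that concrete, here&apos;s a sketch in R (the thirty scores are randomly generated stand-ins, not real grades):
	</p>
	<blockquote>
<pre><code>&gt; scores &lt;- round(runif(30, min = 60, max = 100))  # thirty invented test scores
&gt; mean(scores)              # the parameter: the true class average
&gt; mean(sample(scores, 10))  # a statistic: the average of a sample of ten</code></pre>
	</blockquote>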
	<p>
		The company mentioned would have a difficult time finding the actual parameter describing how many people in the country know their name.
		It may or may not be possible, but it certainly wouldn&apos;t be feasible or an economically-viable option.
		They&apos;re much better off conducting a small survey to get a statistic, then making inferences about the parameter from that statistic.
		They can&apos;t get the exact answer from the statistic, but they will likely get pretty close.
	</p>
</section>
<section id="Unit2">
	<h2>Unit 2</h2>
	<h3>Daily reflection</h3>
	<h4>2018-09-13</h4>
	<p>
		I began reading the material for the week.
		I found it took me a bit to figure out what the book was doing in its examples.
		It didn&apos;t explain the functions it used very well, but I think I finally understand what the functions actually do.
	</p>
	<p>
		The book also tells how to change the working directory, but only via a graphical front end to R.
		It doesn&apos;t tell the command needed to do so if working with R directly.
		Someone in the discussion forum mentioned how to change the working directory though.
		It&apos;s an oversight by the author, but if I need help figuring out how to do it without the front end (I don&apos;t have a front end installed, nor do I know which to install to match up with what the book is doing), I know where to get the information I need.
		There&apos;s no actual need to set the working directory in most cases though; you can simply specify full file paths instead of relative file paths.
	</p>
	<h4>2018-09-14</h4>
	<p>
		I wrote up my initial discussion post for the week.
		It really got me thinking about how ineffective letter grades are.
		Precision is irreversibly lost when converting fine-grained numeric grades into a handful of discrete letter grades.
	</p>
	<h4>2018-09-15</h4>
	<p>
		I spent my time today working on reading material for my other course, so I didn&apos;t get much done in this course.
		I did get a post submitted in the discussion board though.
	</p>
	<h4>2018-09-16</h4>
	<p>
		I didn&apos;t spend much time on coursework today.
		I was dealing with body image issues.
		I should be fine tomorrow.
	</p>
	<p>
		I did get a couple discussion board posts made before midnight though.
		Not the university&apos;s midnight, but my own time zone&apos;s midnight.
		I use the university&apos;s time zone when referencing the university&apos;s imposed deadlines, but I use my own time zone when calculating the deadlines I impose myself, such as my goal of making at least one post per day.
	</p>
	<p>
		There&apos;s a good chance I won&apos;t get much done in this course this week until the final day, which is my only day off.
		One of the two reading assignments in my other course is forty-seven pages long, so it&apos;s taking a while to finish.
	</p>
	<h4>2018-09-17</h4>
	<p>
		Today was spent reading material from my other course.
	</p>
	<h4>2018-09-18</h4>
	<p>
		Today was also spent reading material from my other course.
	</p>
	<h4>2018-09-19</h4>
	<p>
		While completing the assignment for the week, I wanted to get as clean of R output examples as I could.
		For that reason, whenever I needed to copy some output, I started an entirely new R session and re-imported the database we were working with.
		While doing this, I stumbled upon the option to save R sessions from the console directly.
		Last week, we were told the option would show itself if we tried to close the console window.
		This doesn&apos;t work when using an actual console though; it only works when using certain console-like applications.
		On a real console, there&apos;s no way for a running program to intercept the user closing the console window.
		The program is simply terminated.
		It turns out that R gives the option when the <code>quit()</code> command is run too though.
		Will that be useful in this course?
		I&apos;m not sure.
		If I need to save a session though, I now know how.
	</p>
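	<p>
		For future reference, base R has these commands for saving a session from the console:
	</p>
	<blockquote>
<pre><code>&gt; save.image()  # writes the workspace to a .RData file right away
&gt; quit()        # prompts: Save workspace image? [y/n/c]
&gt; q(&quot;yes&quot;)      # or answer in advance and skip the prompt</code></pre>
	</blockquote>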
	<p>
		I finished up the discussion for the week today, but I didn&apos;t have time for the ungraded quiz this week.
		The reading assignment in my other course was just too long and took up too much of my time.
	</p>
	<h3>The <code>length()</code> function</h3>
	<p>
		The <code>length()</code> function returns the length of any object for which a length-getting method has been defined.
		This includes, but is not limited to, vectors and factors.
		The length returned will be a non-negative whole number; it turns out R does have a true integer type, and that&apos;s what <code>length()</code> reports (falling back to a double only for very long vectors).
		Plain numbers typed into R default to doubles though, which is why most values show up as &quot;numerics&quot;.
	</p>
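	<p>
		For example, both vectors and factors report a length:
	</p>
	<blockquote>
<pre><code>&gt; length(c(10, 20, 30))              # a numeric vector
[1] 3
&gt; length(factor(c(&quot;a&quot;, &quot;b&quot;, &quot;a&quot;)))  # a factor
[1] 3</code></pre>
	</blockquote>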
	<h3>Variance</h3>
	<p>
		Variance measures how much the subjects differ from the average of the population or sample.
		Though I&apos;m unsure of the logic behind it, it&apos;s calculated by averaging the squared difference of each subject from the average.
		Why is this squared?
		I get that some differences will be negative and we want to avoid bizarre effects of that, but that can be achieved using absolute values instead of squares.
		By instead squaring, we cause outliers to have a much bigger impact on our average variance.
	</p>
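	<p>
		A quick comparison of the two approaches, with invented numbers: the outlier contributes half of the absolute total, but nearly three quarters of the squared total.
	</p>
	<blockquote>
<pre><code>&gt; x &lt;- c(2, 4, 6, 20)  # 20 is an outlier
&gt; m &lt;- mean(x)         # 8
&gt; mean(abs(x - m))     # mean absolute deviation: 6
&gt; mean((x - m)^2)      # mean squared deviation: 50
&gt; var(x)               # R&apos;s var() divides by n - 1 instead: about 66.7</code></pre>
	</blockquote>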
	<div class="APA_references">
		<h3>References:</h3>
		<p>
			Yakir, B. (2011, March). Introduction to Statistical Thinking (With R, Without Calculus). Retrieved from <a href="https://my.uopeople.edu/mod/resource/view.php?id=155119"><code>https://my.uopeople.edu/mod/resource/view.php?id=155119</code></a>
		</p>
	</div>
	<h3>More time to study</h3>
	<p>
		The learning journal assignment asks us to identify where we can fit in three more hours of studying.
		I could make up some lies to pass the requirement, but let me be real with you here.
		I spend practically all my free time on coursework.
		I have a day job, I need groceries to live, and I have other errands that need doing.
		Sometimes, something comes up and my family needs me to do something.
		I don&apos;t own a television, I don&apos;t go to the movies, I&apos;m single and don&apos;t go on dates, and I don&apos;t hang out with friends outside of work.
		I mentioned goals I&apos;m working on last week, and how I&apos;m putting another aside for the time being.
		Care to guess what two goals I&apos;ve been focussing on since then?
		It&apos;s coursework completion and improving my typing skills.
		As for my typing practice, I&apos;ve never been one to waste time producing garbage.
		I don&apos;t spend time typing seemingly-random strings of characters like teachers have schoolchildren do when they&apos;re starting out.
		I&apos;m typing things I&apos;d need to type anyway: my coursework and my journal entries.
		My typing practice isn&apos;t detracting from my coursework time.
	</p>
	<p>
		The department of motor vehicles here refuses to issue driving tests to people without telephone service, so even though I know how to drive, I can&apos;t get a driver licence and a car.
		Even if I had a driver licence, it&apos;d take quite a while to save up for an electric car; I refuse to run one of those fossil fuel guzzlers most people use.
		So I&apos;ve got to do everything by bike.
		People underestimate how long it takes to do things by bike, too.
		Take grocery shopping for example.
		You probably head to the grocery store no more than once per week, if even that often, and stock up on what you need.
		Your car will hold it all, right?
		I can&apos;t carry all that on my back.
		Not only do I take longer to make the trip, I have to make that trip a few times each week.
		I don&apos;t have a bunch of extra time to spare.
		I&apos;m very busy!
	</p>
	<p>
		Once each week, on Thursdays, I meet with a Linux user group to chill out and relax.
		These weekly meetings are the only recreation I have, and I&apos;m not giving them up.
		Outside of work, these are the only social interactions I have.
		And let&apos;s face it: though I do get social interactions at work, no one there actually cares about me or my life in any meaningful way.
		I don&apos;t actually have time in my life for real friends, and likely won&apos;t until I&apos;ve completed my degree.
		The Linux user group is the closest thing I really have to friends.
		We share common interests and help one another solve problems.
		I&apos;m sure you understand it&apos;s not fair to ask me to give that up.
		If I give that up, the only thing I&apos;m likely to achieve is mental and emotional burnout.
		And then I&apos;ll hardly get any coursework done.
		A few hours of socialising once per week really isn&apos;t too much to ask.
	</p>
	<p>
		Like several assignments I&apos;ve encountered at this school, this assignment assumes a lifestyle that I don&apos;t actually have.
		It assumes that I don&apos;t already give my coursework my all; that I have more time to spare.
		I don&apos;t.
		I really don&apos;t.
		Last term, I encountered an assignment asking me what I could give up to be more green.
		I&apos;ve already given up animal products, which are one of the largest contributors to greenhouse gasses.
		As I said before, I don&apos;t drive, but bike instead.
		I recycle, I conserve water and power, and I even get as much of my stuff second hand as I can.
		I&apos;m doing my best.
		Within feasibility, there&apos;s nothing more I can do.
	</p>
	<p>
		As for needing more time in coming weeks, I&apos;ll fit in what I need to.
		Combined, I give my courses as much as I can.
		However, when necessary, I can take time from one course to give to another.
		This week, my other course took most of my time because of its huge reading assignment.
		When this course takes more time, likely the other course will be at a slower point.
	</p>
</section>
<section id="Unit3">
	<h2>Unit 3</h2>
	<h3>Daily reflection</h3>
	<h4>2018-09-20</h4>
	<p>
		I&apos;m a bit behind from last week, given that huge reading assignment in my other course.
		As such, I didn&apos;t get much done in class today.
		I did get my initial discussion post in though.
		I have a feeling I&apos;ll be submitting many of my posts in this course early in the week this week, to buy me time on my other course&apos;s discussion assignment.
	</p>
	<h4>2018-09-21</h4>
	<p>
		Today was spent focusing on my other course as well.
		I did make sure to get three discussion posts in though.
	</p>
	<h4>2018-09-22</h4>
	<p>
		Once more, I focussed on my other course today, and only got a discussion post in for this course.
		I&apos;ve finished the other course&apos;s reading material for the week though, so I should have time to work on this course&apos;s reading material today.
	</p>
	<h4>2018-09-23</h4>
	<p>
		Today, I read the assigned material in this course for the week.
		I thought it&apos;d take a day or two, but the chapter was rather short.
	</p>
	<p>
		We covered histograms, which are rather useful.
		I think most people refer to them as bar graphs instead though.
		I guess that&apos;s not the technical term for them, but until now, I thought it was.
	</p>
	<p>
		We covered use of the <code>\$</code> operator in R.
		Oddly, we already had to use that operator last week, so we had to figure out how to use it on our own, without the textbook&apos;s help.
		At this point, telling us how to use it does us no good.
	</p>
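	<p>
		For anyone following along, this is how I&apos;ve been using the operator (the data frame and its values are made up for illustration):
	</p>
	<blockquote>
<pre><code>&gt; d &lt;- data.frame(height = c(170, 165, 182))
&gt; d\$height        # extract the &quot;height&quot; column by name
[1] 170 165 182
&gt; mean(d\$height)
[1] 172.3333</code></pre>
	</blockquote>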
	<p>
		We also covered box plots.
		I&apos;ve never been convinced that box plots are a good way to display data, but maybe that&apos;s just me.
		I&apos;m just not sure what we&apos;re supposed to get from seeing the first and third quartile or the median.
		The mean and the mode are both useful, but what do you even get from the median?
		It looks like these types of graphs can help identify outliers, but I feel a simple histogram can do that even better.
	</p>
	<p>
		Lastly, we reviewed standard deviations.
		Just like last week, no explanation is given as to why we square our numbers.
		It doesn&apos;t seem like a good idea, though I assume that with statistics being such an old field, this is a time-tested method that has very strategic motivations behind it.
		Mathematics is meant to be precise, so there&apos;s got to be a good reason for the formula.
	</p>
	<h4>2018-09-24</h4>
	<p>
		I took a look at my learning journal assignment for the week, only to find it was what I&apos;d submitted <strong>last</strong> week.
		Last week, I was in a rush near the end because of a huge reading assignment in my other course.
		Oddly, the learning journal assignments are released a day before the previous one is due.
		I must&apos;ve reloaded the course index page after the new assignment was out, then in my haste, clicked on the wrong assignment.
		I remember being shocked at the assignment; I thought I&apos;d read it earlier in the week, and now it was drastically different than I remembered.
		It&apos;s no wonder; it was a different assignment!
		This week&apos;s assignment set me off a bit too, so I guess I didn&apos;t verify it was the right one partly because I&apos;d gotten hotheaded.
		That&apos;s all my fault, but that doesn&apos;t make it any less frustrating.
	</p>
	<h4>2018-09-25</h4>
	<p>
		Seeing as I already submitted this week&apos;s learning journal assignment last week, it doesn&apos;t make sense for me to submit it again.
		And in that mix-up, I ended up not submitting the work I was supposed to last week.
		So I wrote up a submission for last week&apos;s learning journal assignment to use in this week&apos;s submission.
		I know it doesn&apos;t at all make up for not getting it submitted correctly last week, but given the situation I&apos;ve gotten myself into, it seems like the only reasonable option.
		I mean, like I said, the alternative is to submit this week&apos;s assignment a second time, and it makes zero sense to do that.
	</p>
	<h4>2018-09-26</h4>
	<p>
		Today was spent finishing up a few last-minute tasks.
		I started with the grading of last week&apos;s work, and found grading instructions to be wrong.
		One of the questions asked us a multiple-choice question about a bar graph.
		We were told to identify whether the bar in question identified the mean, the median, or the mode of the dataset.
		However, the grading rubric said that the bar instead represented that three of the sampled flowers had a sepal width of three.
		That wasn&apos;t even one of the options.
		And like I said, the question was multiple choice.
		There was no option to specify a different answer.
	</p>
	<p>
		Next, I took a look at the ungraded quiz.
		It seemed very complicated.
		I&apos;m sure I could have passed it, given more time, but much of my time this week was spent catching up from last week, due to the overly-long reading assignment in my other course.
		So I ended up skipping the ungraded quiz.
	</p>
	<p>
		Oddly enough, the graded quiz for the week was far simpler.
		My one complaint is that near the end, we were asked to fill out the missing values in a table, being told we&apos;d need them, but then we only used two of those values.
		Having put in the effort to fill out the table, it&apos;d be nice to have needed at least half the found values.
	</p>
	<p>
		Lastly, I submitted my final discussion post for the week.
	</p>
	<p>
		If I can get the head start tomorrow that I need for the week, the coming week should go a lot smoother.
		If I can&apos;t, maybe I&apos;ll try to get someone to cover a shift at work for me so I can get an extra study day.
	</p>
	<h3>Definitions</h3>
	<p>
		Frequency is the number of times a particular value occurs in our data set (Yakir, 2011).
		It&apos;s a raw number that doesn&apos;t tell us much on its own.
		&quot;Okay, so we see the value <code>5</code> in our data set three times, but what does that mean?&quot;
		Relative frequency, on the other hand, is the number of times a given value occurs divided by the total number of data points (Yakir, 2011).
		It&apos;s a fraction; a decimal; a percentage.
		It gives us some actual perspective on how often the given value occurs in a way that we can draw conclusions about our population given our sample.
	</p>
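	<p>
		In R terms, using a made-up vector, the relative frequencies are just the frequencies divided by the number of data points:
	</p>
	<blockquote>
<pre><code>&gt; x &lt;- c(5, 5, 5, 3, 2)
&gt; table(x)              # frequencies
x
2 3 5
1 1 3
&gt; table(x) / length(x)  # relative frequencies
x
  2   3   5
0.2 0.2 0.6</code></pre>
	</blockquote>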
	<h3>R session</h3>
	<blockquote>
<pre><code>R version 3.3.3 (2017-03-06) -- &quot;Another Canoe&quot;
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type &apos;license()&apos; or &apos;licence()&apos; for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type &apos;contributors()&apos; for more information and
&apos;citation()&apos; on how to cite R or R packages in publications.

Type &apos;demo()&apos; for some demos, &apos;help()&apos; for on-line help, or
&apos;help.start()&apos; for an HTML browser interface to help.
Type &apos;q()&apos; to quit R.

&gt; getwd()
[1] &quot;/home/yst&quot;
&gt; setwd(&quot;Downloads&quot;)
&gt; getwd()
[1] &quot;/home/yst/Downloads&quot;
&gt; ex.1 &lt;- read.csv(&quot;ex1.csv&quot;)
&gt; summary(ex.1)
       id              sex         height     
 Min.   :1538611   FEMALE:54   Min.   :117.0  
 1st Qu.:3339583   MALE  :46   1st Qu.:158.0  
 Median :5105620               Median :171.0  
 Mean   :5412367               Mean   :170.1  
 3rd Qu.:7622236               3rd Qu.:180.2  
 Max.   :9878130               Max.   :208.0  
&gt;</code></pre>
	</blockquote>
	<div class="APA_references">
		<h3>References:</h3>
		<p>
			Yakir, B. (2011, March). Introduction to Statistical Thinking (With R, Without Calculus). Retrieved from <a href="https://my.uopeople.edu/mod/resource/view.php?id=155119"><code>https://my.uopeople.edu/mod/resource/view.php?id=155119</code></a>
		</p>
	</div>
</section>
<section id="Unit4">
	<h2>Unit 4</h2>
	<h3>Daily reflection</h3>
	<h4>2018-09-27</h4>
	<p>
		I couldn&apos;t get started right away today, as I had errands I&apos;d neglected to run previously that needed to be taken care of before tonight&apos;s meeting, but I did get the reading material completed before I had to leave for the meeting.
	</p>
	<p>
		The book makes the claim that abstract variance will be useful in the lives of people taking the course, and goes on for a good while trying to convince us of that.
		This assumes though that statistics in general is even useful to takers of the course.
		For me, both statistics and abstraction are incredibly useful, but then again, I&apos;m an aspiring programmer.
		Everything I work with is an abstraction of numbers, and those numbers are an abstraction for on and off signals in the machine.
		I have no doubt that I&apos;ll find use for abstract statistics.
		This is a required course for graduation though.
		We&apos;re here not because the course material will pertain to our lives in any way, but because we need to be here to earn our degrees.
		I&apos;m not convinced everyone in the course will get as much out of this as I will.
		Some people here probably won&apos;t have a use for it at all.
	</p>
	<p>
		Then again, abstraction is a very useful skill, and is one of the major components in being human.
		We abstract many times every day, usually without even noticing it.
		For example, you just paid fifty dollars for a load of groceries?
		Nope.
		No, you didn&apos;t.
		Dollars don&apos;t exist.
		For the groceries, you just traded several pieces of paper that the government <strong>told</strong> you have certain values, but the unit of value doesn&apos;t have any meaning beyond the meaning people choose to give it.
		In making a purchase, you just abstracted.
		Even if people don&apos;t find use for the statistical part of the lesson, maybe the abstraction practice is good for something.
	</p>
	<p>
		The book continued, explaining mean and deviation again, only for it to compare the &quot;definition of the [mean/deviation] of a sample&quot; to the &quot;definition of the [mean/deviation] of a population&quot;, as if these uses of the words &quot;mean&quot; and &quot;deviation&quot; were somehow different.
		Every finite numeric set has a mean and a deviation value though, and these words mean the same thing in every case.
		It&apos;s not some separate definition that needs to be explained.
		The same goes for standard deviation, which was explained as a &quot;new&quot; thing next.
		After that, we moved on from talking about populations without actually talking about how we were supposed to abstract them.
	</p>
	<p>
		So-called &quot;random variables&quot; seem to be a combination of algebra, set theory, and probability distribution.
		While we covered what they are and saw examples of what you could do with them, we didn&apos;t see any useful cases for them.
		It looks like we&apos;ll be reading more about them over the next couple weeks though, so that&apos;s something to look forward to.
		I think they could be used for interesting and/or useful things, I&apos;m just not sure what.
	</p>
	<p>
		We got our first look at type-casting in R, if you don&apos;t count the casting of floats (&quot;numerics&quot;) to strings (&quot;factors&quot;).
		It seems the <code>mean()</code> function is also able to cast booleans into floats.
		Or perhaps that&apos;s a feature of the language itself, and all functions that expect numbers can handle booleans in this way.
	</p>
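	<p>
		A quick test session suggests it is indeed the language itself: logicals coerce to numerics (<code>TRUE</code> as <code>1</code>, <code>FALSE</code> as <code>0</code>) in any arithmetic context, not just in <code>mean()</code>:
	</p>
	<blockquote>
<pre><code>x = c(TRUE, FALSE, TRUE, TRUE)
mean(x)   # 0.75, the proportion of TRUE values
sum(x)    # 3, so the coercion is not specific to mean()</code></pre>
	</blockquote>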
	<p>
		Lastly, I got my discussion post for the day written up and submitted.
	</p>
	<h4>2018-09-28</h4>
	<p>
		The reading material in my other course looks like it&apos;s going to take a few days again.
		I mostly focussed on that.
		I did get a discussion post for this course submitted though.
	</p>
	<h4>2018-09-29</h4>
	<p>
		I spent today working on work for my other course.
	</p>
	<h4>2018-09-30</h4>
	<p>
		Again, I focussed on my other course today.
	</p>
	<h4>2018-10-01</h4>
	<p>
		I started work on the main assignment for the week, though I wasn&apos;t able to finish it today.
		It&apos;s unlikely I&apos;ll be able to work on it tomorrow, as I&apos;ll be visiting with my mother.
	</p>
	<p>
		I got a few discussion posts completed and handed in.
	</p>
	<h4>2018-10-02</h4>
	<p>
		I got started on my main assignment for this week.
		The beginning of the instructions confused me at first, and I thought there was a Task 1-8 (&quot;one dash eight&quot;) I needed to find somewhere.
		It didn&apos;t seem to exist in the textbook or anywhere else though.
		It was actually tasks 1-8 (&quot;one through eight&quot;), which would be explained further down in the instructions.
	</p>
	<h4>2018-10-03</h4>
	<p>
		This week was hectic.
		I finished everything I needed to with only five minutes to spare.
		Ugh.
		I think my main assignment submission in this course suffered a little due to the time crunch.
		My submission for my other course suffered immensely though.
	</p>
	<p>
		Today, I finished up the discussion assignment and the main assignment, and got both submitted.
	</p>
	<h3>x̄ and μ</h3>
	<p>
		x̄, pronounced &quot;ex bar&quot;, represents the average of the sample data represented by x.
		μ, pronounced &quot;mew&quot; and spelled out as &quot;mu&quot;, represents the mean of an entire population.
		There are two main differences between the two.
		The first is that x̄ only relates to the sample taken, and not the data from the entire population that we&apos;re interested in.
		This means that for different samples from the same population, we will very likely have some variance between each sample&apos;s x̄s.
		Second, x̄ is feasibly calculable.
		By definition, x̄ relates to whatever sample we have.
		It relates to data that we not only can get, but have already gotten.
		On the other hand, μ relates to the data of the entire population, which in most useful cases, isn&apos;t a data set we can fully acquire.
		We can make inferences about μ based on what we know about x̄, but we usually can&apos;t work with, look at, or measure μ itself in any way.
	</p>
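	<p>
		This difference is easy to see in R with a toy population where μ actually is computable (the numbers below are illustrative choices of my own):
	</p>
	<blockquote>
<pre><code>set.seed(42)   # for reproducibility
population = rnorm(100000, mean = 50, sd = 10)
mu = mean(population)   # feasible here only because the population is simulated

# Two samples from the same population give two different sample means,
# each a little different from mu and from each other.
x_bar_1 = mean(sample(population, 30))
x_bar_2 = mean(sample(population, 30))</code></pre>
	</blockquote>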
	<h3>weighted mean</h3>
	<p>
		The typical way of finding the mean of a set of numbers is to average them out.
		However, in some cases, the different numbers come up at different frequencies.
		Instead of the mean, we&apos;re more interested in the weighted mean.
		(The book doesn&apos;t call it a weighted mean, it just calls it a mean, but the method shown by the book for solving such problems actually shows how to find the weighted mean.)
		To find the weighted mean, you simply multiply each number by its weight (which in this case is the number&apos;s relative frequency), then add all the results together.
		For example, if we have the data set <code>21</code>, <code>22</code>, <code>34</code>, <code>48</code>, <code>49</code>, <code>58</code>, <code>63</code> with respective weights <code>0.5</code>, <code>0.25</code>, <code>0.125</code>, <code>0.0625</code>, <code>0.03125</code>, <code>0.015625</code>, <code>0.015625</code>, we&apos;d have a weighted mean of about <code>26.67188</code>.
		We can verify this with the following R session:
	</p>
	<blockquote>
<pre><code>R version 3.3.3 (2017-03-06) -- &quot;Another Canoe&quot;
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type &apos;license()&apos; or &apos;licence()&apos; for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type &apos;contributors()&apos; for more information and
&apos;citation()&apos; on how to cite R or R packages in publications.

Type &apos;demo()&apos; for some demos, &apos;help()&apos; for on-line help, or
&apos;help.start()&apos; for an HTML browser interface to help.
Type &apos;q()&apos; to quit R.

&gt; 21*0.5+22*0.25+34*0.125+48*0.0625+49*0.03125+58*0.015625+63*0.015625
[1] 26.67188
&gt;</code></pre>
	</blockquote>
	<p>
		All this does is multiply each value by its weight and add the results together.
		Because the result isn&apos;t assigned to a variable, R just spits out the result onto the command line.
		I did some testing, and it seems R only displays about seven significant digits by default, rounding away anything smaller.
		Modifying the smaller weights doesn&apos;t seem to affect the displayed answer.
		In other words, the given answer is only approximate; the value stored internally is more precise.
	</p>
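	<p>
		Incidentally, base R has a <code>weighted.mean()</code> function that performs the same calculation, and printing with more digits shows that the stored value is exact; only the display is rounded:
	</p>
	<blockquote>
<pre><code>x = c(21, 22, 34, 48, 49, 58, 63)
w = c(0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.015625)

sum(x * w)                      # the hand calculation from above
weighted.mean(x, w)             # the built-in equivalent
print(sum(x * w), digits = 10)  # 26.671875: only the display was rounded</code></pre>
	</blockquote>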
</section>
<section id="Unit5">
	<h2>Unit 5</h2>
	<h3>Daily reflection</h3>
	<h4>2018-10-04</h4>
	<p>
		I burnt out yesterday from stress, despite my best efforts.
		As a result, I got nothing done today.
		I didn&apos;t even attend my weekly Thursday meeting.
		For the next five days, I&apos;ve got shifts at work, so there is no catching back up unless I do something right now.
		I&apos;ll pull an all-nighter and get back to where I need to be.
		I&apos;ll probably read all the assigned material for both my courses before I get any sleep, as well as hopefully the bulk of both my learning journal assignments.
		From there, I&apos;ll be able to manage my written assignments throughout the remaining workday.
		Tomorrow&apos;s work shift is going to be horrible, but I don&apos;t see any other options.
		That&apos;s just what happens when you fail to stay strong, and instead give in to stress.
	</p>
	<h4>2018-10-05</h4>
	<p>
		I wasn&apos;t able to get as much done as I&apos;d planned to, but I got a good chunk done, and if I can stay focussed, I should do fine.
		I&apos;ve got a bit more reading to do, as well as the exercise for the learning journal assignment, but I got all the reading material completed for my other course (at least the parts not hosted on websites that maliciously block me based on my $a[IP] address, which is all I&apos;m going to be able to access anyway) and gotten that other learning journal assignment done, so I should have plenty of time to finish up in this course by week&apos;s end.
	</p>
	<p>
		Today as I started reading about random variables again, they reminded me a lot of when I tried to study quantum mechanics.
		I couldn&apos;t wrap my head around most of it, but the main thing that stuck with me was that particles exist in a superposition until you measure them.
		They&apos;re in multiple places at once until then.
		This is sort of like how random variables work.
		They represent a multitude of potential answers, and once you make your observation, one of those possibilities becomes reality, while the others just fall away.
	</p>
	<p>
		The book mentions that when dealing with discrete random variables, each possible value is separated by some amount of space from the neighbouring possibilities.
		This is certainly true, but the book doesn&apos;t mention why this is.
		To put it simply, the defining characteristic of discrete random variables is that specific, discrete values are used.
		There are no ranges, only specific values.
		No two numeric values can ever touch; there is always an infinite number of values between them.
		The only way to have consecutive, unseparated values is to include not discrete values, but a range.
	</p>
	<p>
		The concept of binomials is discussed in depth by the book, but to a computer scientist such as myself, the concept is already familiar.
		Basically, a binomial is a random variable based on the number of a specific boolean coming up, given a number of attempts.
		Computer scientists typically use <code>true</code> and <code>false</code> as the names for booleans, but any pair of labels will do.
		The book talks about using &quot;success&quot; and &quot;failure&quot; as the boolean labels, which works just as well.
	</p>
	<p>
		Poisson random variables, on the other hand, have an unlimited number of discrete values.
		This seems to me to be very difficult to model.
	</p>
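	<p>
		One thing that makes a Poisson variable manageable despite its unlimited range is how quickly the tail probabilities vanish. A sketch, using an illustrative rate of my own choosing:
	</p>
	<blockquote>
<pre><code>lambda = 3   # illustrative rate parameter

dpois(0:10, lambda)     # probabilities of the first few counts
1 - ppois(10, lambda)   # everything above 10 combined: well under 0.1%</code></pre>
	</blockquote>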
	<h4>2018-10-06</h4>
	<p>
		There wasn&apos;t much time today, but I did get the grading for the week done.
	</p>
	<h4>2018-10-07</h4>
	<p>
		I submitted my discussion post for the day.
	</p>
	<h4>2018-10-08</h4>
	<p>
		I got back to the reading material today and finally finished it.
		Working with continuous random variables is a bit strange.
		Because you don&apos;t have discrete values, you have to work with ranges.
		The book doesn&apos;t cover this, but that necessarily means you have a level of imprecision involved.
		By choosing smaller and smaller ranges to look at the probabilities for, you get more and more accurate results.
		If you instead use large ranges, you get very rough estimates.
	</p>
	<p>
		If you&apos;re working with a uniform distribution, none of what I said above applies.
		Everything happens with equal probability, so there is no way to find the most probable outcome, as all outcomes are equally probable.
		As for the weighted mean, it&apos;s just the central value; the median.
		A formula for finding the variance of such a distribution is given, but no explanation for it is given whatsoever.
		I can plug numbers into this formula of course, but with no context, I have no idea what the answers found actually mean.
	</p>
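	<p>
		For what it&apos;s worth, the uniform-variance formula, (b - a)<sup>2</sup>/12, can at least be sanity-checked numerically, even without a derivation:
	</p>
	<blockquote>
<pre><code>a = 1
b = 10
(b - a)^2 / 12   # the formula gives 6.75 for this range

# The variance of a large simulated uniform sample lands close to it.
set.seed(7)
var(runif(1000000, min = a, max = b))</code></pre>
	</blockquote>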
	<p>
		Neither the expectation nor variance formulas for exponential distributions were explained.
		We were simply handed formulas and told to use them.
		Without an explanation of why those formulas are to be used though, we&apos;re not really learning anything; we&apos;re just memorising things we don&apos;t actually understand.
	</p>
	<h4>2018-10-09</h4>
	<p>
		I finished up the learning journal exercises and made my discussion post for the day.
		Only a learning journal post remains for tomorrow.
	</p>
	<h4>2018-10-10</h4>
	<p>
		I completed the discussion assignment for the week by making two final posts.
		I had everything completed at a decent pace this week, despite my poor start at the beginning of the week.
	</p>
	<h3>Definitions</h3>
	<p>
		An exponential distribution is a type of continuous distribution in which there is a high probability of hitting zero, but the probability of a given number quickly tapers off as the number rises (Yakir, 2011).
		It does this along a line that looks like it&apos;s drawn according to an exponential equation, though the book didn&apos;t mention the format of these equations, instead telling us to delegate the mathematics to R&apos;s functions.
		Exponential distributions are used when something with a low number is likely to occur, but the probability gets exponentially smaller as the number gets larger.
		However, this sort of distribution is only used for continuous values, not discrete values, as discrete values are handled by Poisson distributions instead.
		An example of when I&apos;d use an exponential distribution would be how long I have to stay at work after my scheduled shift.
		Normally, I leave right about on time.
		Sometimes things run a little late though.
		On even rarer occasions, the boss gives me an extra hour or more.
		There comes a point when we&apos;ve got to close the store though, and I won&apos;t be there after that.
		The longer the amount of added time, the less likely I&apos;ll see it added to the end of my shift.
	</p>
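	<p>
		My overtime example could be sketched with R&apos;s <code>pexp()</code> function. The rate below is an assumption of mine purely for illustration: an average overtime of ten minutes, so a rate of one per ten minutes.
	</p>
	<blockquote>
<pre><code>rate = 1 / 10   # assumed: overtime averages ten minutes

pexp(5, rate)        # probability of at most five minutes over: about 0.39
1 - pexp(30, rate)   # probability of more than half an hour over: about 0.05</code></pre>
	</blockquote>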
	<p>
		When dealing with a binomial distribution, a binary option is presented along with a probability of &quot;success&quot;.
		A number of tries is considered, and the distribution represents the probability of each number of success cases one might see (Yakir, 2011).
		In such distributions, each trial has to be completely isolated from the other trials.
		A given success or failure cannot change the probability used for any other trial measured.
		An example of this would be encountering a rare item in a computer game.
		There&apos;s a game I play when I have time (which sadly isn&apos;t often, these days) called Minetest.
		It&apos;s basically a sandbox game where you build out of giant construction cubes, known to players as &quot;nodes&quot;.
		The world itself was sectioned into &quot;blocks&quot;, which are 16<sup>3</sup>-node spaces.
		For a short period of time, the game featured a rare node called the PB&amp;J Pup.
		It served no purpose in the game but to act as a trophy for players to discover.
		The code that spawned the pups was a bit more complicated than what I&apos;ll describe, but if you read the code, it was clear that one pup was supposed to spawn in about every one thousand blocks.
		(Due to an oversight, some blocks had a double chance of spawning a pup, and for simplicity in the code, one block&apos;s pup might spawn instead in another block, but for this example, we&apos;ll ignore both issues.)
		I tend to build long tunnels for fun.
		My tunnels are wide and tall, so they take up entire blocks that they pass through.
		Until PB&amp;J Pups were removed from the game, this gave me a one in one thousand chance of digging up one of these little treasures with each new block I dug through.
		The odds were not in my favour, but the probability of finding a pup in each block (if we ignore the technical implementation issues I described) didn&apos;t change from block to block, and finding (or not finding) a pup in any given block didn&apos;t affect my chances of finding one in any other block.
		With that in mind, we can look at the number of pups I was likely to find, given the number of blocks I tunnelled through.
		This unknown-until-I-try-it number would be represented by a random variable with a binomial distribution.
	</p>
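	<p>
		The pup example translates directly into R (using the simplified one-in-one-thousand figure from above):
	</p>
	<blockquote>
<pre><code>blocks = 1000   # blocks tunnelled through
p = 1 / 1000    # chance of a pup in each block

dbinom(0, blocks, p)       # no pups at all: about 0.37
dbinom(1, blocks, p)       # exactly one pup: about 0.37
1 - pbinom(1, blocks, p)   # two or more pups: about 0.26</code></pre>
	</blockquote>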
	<h3>Using R</h3>
	<h4>Case A</h4>
	<blockquote>
<pre><code>R version 3.3.3 (2017-03-06) -- &quot;Another Canoe&quot;
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type &apos;license()&apos; or &apos;licence()&apos; for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type &apos;contributors()&apos; for more information and
&apos;citation()&apos; on how to cite R or R packages in publications.

Type &apos;demo()&apos; for some demos, &apos;help()&apos; for on-line help, or
&apos;help.start()&apos; for an HTML browser interface to help.
Type &apos;q()&apos; to quit R.

&gt; ?pbinom
&gt; pbinom(q=5, size=10, prob=1/6)
[1] 0.9975618
&gt;</code></pre>
	</blockquote>
	<p>
		The <code>pbinom()</code> function outputs the cumulative probability of a binomial.
		What this means is that we can use it to determine the probability of the binomial random variable turning out to be at most whatever value we specify.
		<code>q</code> is typically a vector of the values we want to display cumulative probabilities for.
		In the example here though, we used a single number instead of a vector.
		The <code>pbinom()</code> function gracefully handles this by treating it as a vector containing only the one number we specified.
		<code>size</code> is the number of trials we want to use in our calculation.
		The <code>prob</code> value is a vector of probabilities for each trial&apos;s success.
		In our example, we used a single number instead of a vector, but <code>pbinom()</code> again gracefully treats this as a vector of only one value.
	</p>
	<p>
		Putting that all together, the code above gets R to calculate the probability of getting five or fewer success cases out of ten when there is a one-sixth probability of success each time.
	</p>
	<h4>Case B</h4>
	<blockquote>
<pre><code>R version 3.3.3 (2017-03-06) -- &quot;Another Canoe&quot;
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type &apos;license()&apos; or &apos;licence()&apos; for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type &apos;contributors()&apos; for more information and
&apos;citation()&apos; on how to cite R or R packages in publications.

Type &apos;demo()&apos; for some demos, &apos;help()&apos; for on-line help, or
&apos;help.start()&apos; for an HTML browser interface to help.
Type &apos;q()&apos; to quit R.

&gt; n=10
&gt; p=.5
&gt; x=9
&gt; pbinom(x, n, p)
[1] 0.9990234
&gt;</code></pre>
	</blockquote>
	<p>
		Here, we&apos;re basically doing the same thing again, but with different numbers.
		Also, we&apos;re assigning the values to variables before popping them into the <code>pbinom()</code> function.
		This time, we&apos;re looking for the odds of getting no more than nine out of ten successes when there is a 50% probability of getting a success each time.
	</p>
	<h4>Case C</h4>
	<blockquote>
<pre><code>R version 3.3.3 (2017-03-06) -- &quot;Another Canoe&quot;
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type &apos;license()&apos; or &apos;licence()&apos; for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type &apos;contributors()&apos; for more information and
&apos;citation()&apos; on how to cite R or R packages in publications.

Type &apos;demo()&apos; for some demos, &apos;help()&apos; for on-line help, or
&apos;help.start()&apos; for an HTML browser interface to help.
Type &apos;q()&apos; to quit R.

&gt; ?punif
&gt; punif(5, min=1, max=10) - punif(4, min=1, max=10)
[1] 0.1111111
&gt;</code></pre>
	</blockquote>
	<p>
		<code>punif()</code> is used to find the cumulative distribution at a given point in a uniform distribution.
		In the code example above, the distribution spreads all probability evenly from <code>1</code> to <code>10</code>, and the first call finds the probability of the result being <code>5</code> or less.
		It then finds the odds in the same distribution of the discovered value being four or less.
		Subtracting these, it tells us the probability of the found value being greater than four but less than or equal to five.
	</p>
	<div class="APA_references">
		<h3>References:</h3>
		<p>
			Yakir, B. (2011, March). Introduction to Statistical Thinking (With R, Without Calculus). Retrieved from <a href="https://my.uopeople.edu/mod/resource/view.php?id=155119"><code>https://my.uopeople.edu/mod/resource/view.php?id=155119</code></a>
		</p>
	</div>
</section>
<section id="Unit6">
	<h2>Unit 6</h2>
	<h3>Daily reflection</h3>
	<h4>2018-10-11</h4>
	<p>
		I started and completed the reading assignment, then submitted my discussion post for the day.
	</p>
	<p>
		Z-scores are confusing, and I&apos;m not sure what function they provide to us.
		From the looks of it, they&apos;re mainly used as a roundabout way to find probabilities that we could have found easier without them.
		Perhaps later units will elaborate on why we should use them at all.
	</p>
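	<p>
		One property z-scores clearly do have, whatever their larger purpose turns out to be, is that they convert any normal-distribution question into a standard-normal question. A quick check, with numbers of my own choosing:
	</p>
	<blockquote>
<pre><code>mu = 7
sigma = 2
x = 10
z = (x - mu) / sigma   # the z-score: 1.5

pnorm(x, mean = mu, sd = sigma)   # asked on the original scale
pnorm(z)                          # asked via the z-score: same answer</code></pre>
	</blockquote>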
	<p>
		Default arguments in the context of R functions were finally explained by the textbook, after already having dealt with them in past units.
		As a programmer, I gave it no thought at the time.
		However, now that they&apos;re actually being covered, I can&apos;t help but feel this was likely confusing for some of the non-programmers in the course until this week.
	</p>
	<h4>2018-10-12</h4>
	<p>
		Today was spent working on work for my other course.
	</p>
	<h4>2018-10-13</h4>
	<p>
		I completed the learning journal exercise today.
		Or my attempt at it, anyway.
		The final part was too vague to form a real answer to.
		It asked the following:
	</p>
	<blockquote>
		<p>
			If you know the mean and standard deviation of a normally distributed population, and somebody asks you questions about the highest 1% and lowest 1% of numbers in that population, what could you tell them?
		</p>
	</blockquote>
	<p>
		If they asked us questions, what could we tell them?
		Well, that&apos;d depend entirely on what questions they asked!
		I&apos;m completely at a loss as to what this part of the assignment is after.
		Maybe this can be fixed for later iterations of the course, so it&apos;ll actually ask what it means to ask.
	</p>
	<p>
		I also wrote up a couple discussion posts.
	</p>
	<h4>2018-10-14</h4>
	<p>
		Yesterday, my plan for today was to complete the main written assignment.
		Looking at it this morning, I knew that wasn&apos;t a realistic goal.
		A better plan would be to try to get it done today and tomorrow.
		I&apos;ve got errands the next day though, then I&apos;ll be visiting with one of my siblings on the day after that.
		It&apos;s going to be a tight squeeze getting all my coursework into what time I have this week.
		I might need to cut the visit short, but I&apos;m sure they&apos;ll understand.
	</p>
	<h4>2018-10-15</h4>
	<p>
		I ended up not having time to work on the main assignment today, which isn&apos;t good.
		Wednesday is going to be hectic.
		I did submit a few discussion posts though.
	</p>
	<h4>2018-10-16</h4>
	<p>
		Today was Tuesday.
		I pretty much never have time for much coursework on Tuesdays.
		I got a little done, but tomorrow is when the mad scramble will have to occur.
	</p>
	<h4>2018-10-17</h4>
	<p>
		Two of the questions in the quiz today didn&apos;t have enough information to actually be solved, assuming I know what I&apos;m doing in the slightest.
		We were given the expectation for a binomial distribution.
		However, we were <strong>*not*</strong> given the number of trials or the probability of success.
		With either one, we could have figured out the other.
		Likewise, we were given no other data, such as a standard deviation, that would have let us solve the problem the hard way.
		Aside from the expectation (the mean), we were given <strong>*nothing*</strong>.
		And with this nothing, we were asked to solve these problems:
	</p>
	<blockquote>
		<p>
			What is the probability of getting exactly 13 successes in experiment B?
		</p>
	</blockquote>
	<blockquote>
		<p>
			What would be the probability of getting MORE THAN 15 successes in a experiment B?
		</p>
	</blockquote>
	<p>
		I hope I&apos;m not just being a complete moron and that there really wasn&apos;t enough information provided for these problems.
	</p>
	<p>
		Finally, I completed the main assignment and submitted my last few discussion posts for the week.
	</p>
	<h3>Normal distribution</h3>
	<p>
		The normal distribution is a continuous distribution with probabilities lying along a bell-shaped curve (Yakir, 2011).
		The highest odds are clustered about the expectation, and the lesser probabilities taper off into infinity and negative infinity.
		Absolutely every real number is included with some probability in this distribution.
		The <code>pnorm()</code> function is used to calculate the cumulative probability at a given point in a given normal distribution (Yakir, 2011).
	</p>
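	<p>
		A couple of quick <code>pnorm()</code> calls illustrate the symmetry about the expectation (numbers of my own choosing):
	</p>
	<blockquote>
<pre><code>pnorm(0)                         # 0.5: half the probability lies below the mean
1 - pnorm(7, mean = 7, sd = 2)   # 0.5 again, whatever the mean and sd

pnorm(-10)   # tiny, but defined: every real number has some probability</code></pre>
	</blockquote>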
	<h3>Vague</h3>
	<p>
		The learning journal asks us this question:
	</p>
	<blockquote>
		<p>
			If you know the mean and standard deviation of a normally distributed population, and somebody asks you questions about the highest 1% and lowest 1% of numbers in that population, what could you tell them?
		</p>
	</blockquote>
	<p>
		If we were given the questions they&apos;d asked, we could certainly answer them, but we&apos;re not sure what this hypothetical person is asking for.
		Likewise, the book doesn&apos;t seem to mention any particular properties of the first-percentile and ninety-ninth-percentile points.
		The only thing that comes to mind to tell them in the learning journal&apos;s incredibly-vague scenario is that things lying beyond these points are statistical improbabilities.
		In 98% of all situations, a value between these points, not beyond them, will be chosen.
	</p>
	<p>
		If we&apos;re working with hard numbers, we can find the location of these points.
		Again, we don&apos;t know that the location of these points is even what the hypothetical asker wants to know though.
		The <code>qnorm()</code> function will give us those answers.
		We just need to run <code>qnorm(0.01, mean, sd)</code> and <code>qnorm(0.99, mean, sd)</code>, where <code>mean</code> is the expectation (the mean and the expectation are equal in a normal distribution) and <code>sd</code> is the standard deviation.
		With a mean of <code>7</code> and a standard deviation of <code>2</code>, we get the locations of these percentiles using the following R session:
	</p>
	<blockquote>
<pre><code>R version 3.3.3 (2017-03-06) -- &quot;Another Canoe&quot;
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type &apos;license()&apos; or &apos;licence()&apos; for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type &apos;contributors()&apos; for more information and
&apos;citation()&apos; on how to cite R or R packages in publications.

Type &apos;demo()&apos; for some demos, &apos;help()&apos; for on-line help, or
&apos;help.start()&apos; for an HTML browser interface to help.
Type &apos;q()&apos; to quit R.

&gt; qnorm(0.01, 7, 2)
[1] 2.347304
&gt; qnorm(0.99, 7, 2)
[1] 11.6527
&gt;</code></pre>
	</blockquote>
	<p>
		We find that the first percentile is at about <code>2.347304</code> and the ninety-ninth is at about <code>11.6527</code>.
	</p>
	<div class="APA_references">
		<h3>References:</h3>
		<p>
			Yakir, B. (2011, March). Introduction to Statistical Thinking (With R, Without Calculus). Retrieved from <a href="https://my.uopeople.edu/mod/resource/view.php?id=155119"><code>https://my.uopeople.edu/mod/resource/view.php?id=155119</code></a>
		</p>
	</div>
</section>
<section id="Unit7">
	<h2>Unit 7</h2>
	<h3>Daily reflection</h3>
	<h4>2018-10-18</h4>
	<p>
		I&apos;ve done horribly this term.
		I wasn&apos;t prepared for the first week, and I feel like I&apos;ve been paying for it ever since.
		I&apos;ve been scrambling to catch up, bleeding each week a bit into the next.
		I&apos;ve learned from the mistakes I made this term, and I&apos;ll do better next term.
		That said, if patterns continue, there shouldn&apos;t be further issue this term.
		We seem to only have major assignments in even-numbered units in this course.
		Next week is unit eight though, and there&apos;s no peer grading in unit nine, so there typically isn&apos;t an assignment beyond the learning journals and the discussion assignments in unit eight at this school.
		I did have one unit eight essay once, and I poured my heart into it, only to find out later that it wasn&apos;t even a graded assignment.
		It was a bit of a disappointment.
		I think I would have scored really well on it.
	</p>
	<p>
		At first, the reading material for the week had me confused.
		It talked about random samples, which I thought to be samples taken randomly, but it quickly explained what was actually going on.
		A random sample is basically a sample-sized collection of random variables.
		The sample hasn&apos;t been taken yet, so the quantum-like state of the sample is still somewhere off in limbo, but you can perform certain operations on it beforehand.
		We can take the average, for example.
		Since we don&apos;t have any hard numbers to work with, the average of this random sample is a random variable instead of a hard number itself.
		I question how useful that is though.
		The book claims we can know the expectation and distribution of this random variable acting as the average, but can we really?
		To do that, we&apos;d need to know the data of the population the sample will be taken from.
		The whole point of taking a sample though is that it&apos;s not feasible or not even possible to get data on the entire population.
		And without that population data, we have no idea what might turn up in the sample.
		I mean, unless we take a different sample first.
		That could give us an idea about what our second sample might have, but again, not all values in the second sample will necessarily be present in the first.
		We can guess at the average of the second sample, but not the individual values or the space of possible values.
		The book recommended taking a large sample of samples and getting statistics from these, but again, you&apos;re only able to estimate characteristics of the population&apos;s possible samples this way; you don&apos;t get the actual sample space.
		I also have to wonder why you wouldn&apos;t simply combine these samples to form a bigger sample with more accurate predictions about the population.
		That is, why take 100 samples with 100 measurements each instead of one sample with 10000 measurements?
	</p>
	<p>
		The book does go on to specify that these tactics apply in abstract cases too though, cases with no real population at all.
		That seems a lot more useful.
		I&apos;m not sure what you&apos;d do with that sort of abstraction, but abstraction pretty much always has something to teach us about our methods.
	</p>
	<p>
		We got into control structures in R this week!
		Well, one control structure: the <code>for</code> construct.
		From the looks of it, it works like $a[PHP]&apos;s <code>foreach</code> construct, iterating over elements in an array/vector/whatever, not like $a[PHP]&apos;s <code>for</code> structure, which loops until a condition is met, typically incrementing a value to be used repeatedly.
		A special vector had to be created in the examples in the textbook specifically for use in the iteration, then thrown away afterwards.
		That seems inefficient.
		What caught my attention more though was that a vector with a certain number of entries needed to be created to store the results of the iteration&apos;s code before the looping began.
		R does actually let vectors grow one element at a time, but growing a vector inside a loop forces repeated reallocation, so preallocating is the idiomatic approach.
		A vector with zeroed-out values was instantiated, then the values corrected later in the loop.
		Again though, as I mentioned in a past unit, R is indexing from one, not zero, which is very counter-intuitive.
		If working with such loops in R, it&apos;s important not to fall for the mistake of doing things the same way as you would in other languages, such as Java, C, Python, $a[PHP], and other common languages.
		You index from zero in these languages, but in R, that just won&apos;t work.
	</p>
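	<p>
		The pattern the book uses (preallocate, then fill by one-based index) looks something like this minimal sketch of my own:
	</p>
	<blockquote>
<pre><code>n = 5
results = numeric(n)   # preallocated, zero-filled results vector

for (i in 1:n) {       # R indexes from 1, not 0
  results[i] = i^2
}
results   # 1 4 9 16 25</code></pre>
	</blockquote>
	<p>
		(<code>seq_len(n)</code> is the safer idiom when <code>n</code> could be zero, since <code>1:0</code> counts backwards.)
	</p>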
	<p>
		The law of large numbers, if I understand it, is kind of interesting.
		The accuracy of the samples grows with the sample size, with the mean of the sample moving toward the central point of the population&apos;s mean.
		It makes perfect sense, but isn&apos;t something I&apos;d&apos;ve thought about on my own.
		What this tells us though is that there comes a point when we&apos;d not get much benefit from adding to our sample size.
		Basically, as we add more data points, each data point tells us less that we didn&apos;t already know.
		The first several data points get us drastically closer to the correct results, while later data points only get us slightly closer.
		I&apos;d still argue that there&apos;d be a usefulness to having data on the entire population, but there&apos;s likely not enough of a difference between sampling 5% of the population and 10% to warrant the extra leg work to get the 10% sample in many cases.
		Doubling the sample size doesn&apos;t come close to doubling the accuracy of our results.
	</p>
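	<p>
		As a quick check on this diminishing-returns idea, here&apos;s a small simulation of my own (not from the course material), measuring how far sample means typically land from the true mean at a few sample sizes.
		Quadrupling the sample size only roughly halves the typical error.
	</p>

```python
import random
import statistics

random.seed(1)
POPULATION_MEAN = 50  # true mean of the made-up population

def mean_error(n, trials=2000):
    """Typical distance between a size-n sample mean and the true mean."""
    errors = []
    for _ in range(trials):
        sample = [random.gauss(POPULATION_MEAN, 10) for _ in range(n)]
        errors.append(abs(statistics.mean(sample) - POPULATION_MEAN))
    return statistics.mean(errors)

# Each fourfold jump in sample size only about halves the typical error.
for n in (10, 40, 160):
    print(n, round(mean_error(n), 2))
```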
	<p>
		I finished both the reading material and my initial discussion post.
		I wrote up my discussion post before reading, fearing I wouldn&apos;t have time to finish before my meeting today if I didn&apos;t, but found the reading material didn&apos;t really add to my argument anyway.
		It was informative, it just didn&apos;t add enough to what I already understood about the situation described by the discussion assignment to really be worth modifying it.
	</p>
	<h4>2018-10-19</h4>
	<p>
		I got a discussion post handed in for the day, then got my learning journal activity completed.
		It turns out we were required to write about the law of large numbers, which I already went into detail about in yesterday&apos;s entry, so I just moved that paragraph from yesterday&apos;s entry to the definitions portion of the assignment.
		The rest was exceptionally easy too.
		I guess this will just be an easy week, at least in this course.
		That works out well, as I have stuff going on this week: I have visitors tomorrow, then a doctor&apos;s going to extract broken glass from my foot on Wednesday.
	</p>
	<h4>2018-10-20</h4>
	<p>
		I wasted a lot of time, both last night and this morning, making sure everything was ready for my guests.
		Then they never showed.
		Rude.
		If I&apos;d known they weren&apos;t going to show, I could have spent that time working on coursework instead.
	</p>
	<p>
		Anyway, I did get the grading done for the week.
		The answer key claimed that in continuous distributions, there&apos;s no difference between &quot;less than&quot; and &quot;less than or equal to&quot;.
		I&apos;d love to see the maths that back that statement up.
		Until I do, there&apos;s no way I can believe that.
		We can say the difference is negligible, but not non-existent.
		For example, let&apos;s say we have a continuous distribution and want to know the probability of finding a value greater than five.
		We&apos;ll call this X.
		Then we find the probability of finding a value less than five.
		This will be Y.
		If &quot;less than&quot; and &quot;less than or equal to&quot; mean the same thing in this context, we can use the &quot;less than or equal to&quot; variant to show that X + Y necessarily equals <code>1</code>.
		If they mean the same thing, we can then use the &quot;less than&quot; variant, meaning that the probability of finding a value less than five plus the probability of finding a value greater than five is equal to <code>1</code>.
		Simplifying this, we get that the probability of finding a value not equal to five is <code>1</code>, meaning that there is <code>0</code> probability of finding a five in our data set.
		We can repeat this process with any specific value, including values we&apos;ve <strong>*already*</strong> seen in our data set.
		Clearly, even in the context of continuous distributions, there&apos;s a difference between &quot;less than&quot; and &quot;less than or equal to&quot;.
		To say otherwise is a blatant lie.
	</p>
	<p>
		I also found I read one of the problems wrong when I was solving it.
		I thought the problem had asked about the probability of a value landing greater than one standard deviation from the mean.
	</p>
	<p>
		That concludes my work for the week in this course aside from the discussion assignment, which as I mentioned in <a href="#Unit1">Unit 1</a>, I draw out throughout the week due to some censorship issues.
		I still have a big essay in my other course though, so it&apos;d be nice if people wouldn&apos;t set appointments to drop by then just not show.
	</p>
	<h4>2018-10-21</h4>
	<p>
		I wrote up my discussion post for the day.
	</p>
	<h4>2018-10-22</h4>
	<p>
		Since course registration began, I&apos;ve been asking my advisor for help, as the university website won&apos;t let me register for the courses I need.
		They keep ignoring my requests for help, and just repeatedly telling me to register for the coming term, even though I&apos;ve expressed repeatedly that the website won&apos;t let me.
		I finally carbon copied my response to several departments that seemed like they might be in charge of this sort of thing, and that finally prompted the advisor to actually address the problem at hand.
		I spent all day working on this issue to no avail, so I&apos;ll try more tomorrow.
	</p>
	<h4>2018-10-23</h4>
	<p>
		I wrote up my discussion post for the day, and learned a new perspective from another student.
		Work on course registration issues continues.
	</p>
	<h4>2018-10-24</h4>
	<p>
		It&apos;s the final day of the course-registration period, and <strong>*finally*</strong>, the issue was resolved.
		I hope.
		I anticipate the &quot;fix&quot; is going to interfere with graduation next term.
		To top things off though, the courses I need for next term are proctored, and proctors must confirm their willingness to proctor exams by the end of the registration period, which is today.
		I spent all day trying to get in touch with the proctor service to get them to confirm their willingness instead of studying like I&apos;d hoped to.
	</p>
	<h3>Vocabulary</h3>
	<p>
		A sampling distribution is the distribution that represents all possible samples (Yakir, 2011).
		In most cases, the sampling distribution can&apos;t be known, because calculating it would require having all relevant knowledge about the population from which the sample was taken.
		If we had all the information about the population, we wouldn&apos;t need to sample in the first place.
		However, just like how a sample gives us information we can use to make inferences about the population, taking a sample of samples (a collection of several samples) can give us useful information for making inferences about the sampling distribution and other possible samples that we haven&apos;t encountered.
	</p>
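	<p>
		Here&apos;s a small sketch of my own illustrating this &quot;sample of samples&quot; idea: draw many samples from a population and keep each one&apos;s mean.
		The resulting collection of means approximates the sampling distribution of the mean, and its centre lands near the population mean.
	</p>

```python
import random
import statistics

random.seed(7)
# A made-up population of 10,000 values between 0 and 100.
population = [random.uniform(0, 100) for _ in range(10_000)]

# Each entry is the mean of one sample of size 30; together they
# approximate the sampling distribution of the mean.
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(1_000)]

# The centre of the sampling distribution sits near the population mean.
print(round(statistics.mean(population), 1))
print(round(statistics.mean(sample_means), 1))
```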
	<p>
		The law of large numbers is the tendency of data to become more accurate as more samples are taken.
		Simply put, statistics gathered from small samples tend to be far off from parameters of the population.
		The law of large numbers seems to simply tell us that the larger our sample size, the more accurate our statistics (Yakir, 2011).
		This should be pretty intuitive to most people.
		We can see this even more when we go to extreme ends of the spectrum.
		With a sample size of zero, we have no data, but let&apos;s try a sample size of one.
		Do you expect to get data that applies to the whole population?
		No, not really.
		But now let&apos;s try a sample size equal to the population size.
		Now we have perfect data, and all estimates made by it are perfectly correct.
		Even with a sample size of n-1 or something, to mirror the sample size of one, we get excellent data.
		The only problem is that the bigger the sample size, the less feasible it is to acquire the sample.
	</p>
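	<p>
		A minimal illustration of my own (not from the textbook): rolling a fair six-sided die, the running mean drifts toward the true mean of 3.5 as the sample grows.
	</p>

```python
import random
import statistics

random.seed(3)
# 100,000 rolls of a fair six-sided die; the population mean is 3.5.
rolls = [random.randint(1, 6) for _ in range(100_000)]

# The running mean settles toward 3.5 as the sample size grows.
for n in (10, 1_000, 100_000):
    print(n, round(statistics.mean(rolls[:n]), 3))
```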
	<h3>Sample distribution versus sampling distribution</h3>
	<p>
		The sample distribution, or distribution of a sample, is the distribution that describes the data within the sample.
		The sampling distribution, on the other hand, is a distribution that describes all possible samples of a given size for a given population (Yakir, 2011).
	</p>
	<div class="APA_references">
		<h3>References:</h3>
		<p>
			Yakir, B. (2011, March). Introduction to Statistical Thinking (With R, Without Calculus). Retrieved from <a href="https://my.uopeople.edu/mod/resource/view.php?id=155119"><code>https://my.uopeople.edu/mod/resource/view.php?id=155119</code></a>
		</p>
	</div>
</section>
<section id="Unit8">
	<h2>Unit 8</h2>
	<h3>Important lesson</h3>
	<p>
		I learned a lot this term, both in and out of the classroom.
		Offhand, I think the most useful things I learned were the law of large numbers and the central limit theorem, but that might just be because that&apos;s what&apos;s on my mind due to the discussion assignment this week.
		I&apos;ve questioned how accurate statistics drawn from samples can actually be, and have written off many difficult-to-believe statistics on the grounds that samples don&apos;t capture the data of the population well.
		It turns out with large enough samples though, the data of the population actually <strong>*can*</strong> be pretty accurately estimated, assuming the sample is taken in an unbiased way.
		After a point, statistical accuracy isn&apos;t even improved much by increasing sample size.
		You need a large enough sample for good data, but the needed sample size is vastly smaller than I would have thought.
	</p>
	<h3>Advice for future students</h3>
	<p>
		I guess if I were to give a new student advice, it&apos;d be not to bite off more than you can chew.
		Little by little, I&apos;ve been working on improving my life.
		I&apos;ve gotten to a place in life where I can actually set and work toward goals of decent importance to me.
		Over this last break between terms, I was making great progress toward a set of several goals, and thought I could continue at about the same pace while school was in session.
		Not so much though.
		In the first week, I was overloaded, and I fell behind.
		I dropped back my effort toward life goals rather quickly to focus on school, but it was too late.
		I was already behind, and the term had just started.
		For the rest of the term, I&apos;ve been struggling to try to catch back up.
		And apparently, I never did catch up fully.
		I failed the final exam.
	</p>
	<p>
		Another important point for new students taking this course would be to make sure to get your initial discussion posts for each week in fairly early.
		Personally, I didn&apos;t have trouble with this, and I&apos;m pretty sure I got my initial post in during the first two days of each week.
		However, many students don&apos;t do this, and wait until the last day or so.
		Because grading happens during the same week the work is submitted, your work won&apos;t be graded if you don&apos;t submit it early enough.
		What even happens if no student grades your work?
		I&apos;m not even sure.
		I&apos;m guessing you get a zero though.
	</p>
	<p>
		Also, get help once sampling distributions are brought up.
		I&apos;m certain we never covered how to convert a population distribution into a sampling distribution.
		Yet, in <a href="#Unit8">Unit 8</a>, we took not one, but <strong>*two*</strong> tests that required us to have that knowledge.
		The textbook covers how to take samples and use them to <strong>*estimate*</strong> the sampling distribution, but with that method, you&apos;ll get slightly different answers each time.
		And these aren&apos;t multiple-choice questions.
		These are field-entry questions.
		If your answers aren&apos;t exact, you fail.
		I failed both of those tests.
		So when sampling distributions are discussed, ask for help with how to calculate them, because you won&apos;t find that information in the reading material.
	</p>
	<p>
		For the final exam, you&apos;re going to want to make sure ahead of time that your proctor of choice will let you either use your own computer or install R on theirs.
		When I first needed a proctor for a course at this university, I couldn&apos;t find a single one in my city, and I could only find one in the next city over.
		So I&apos;ve been biking to the next city over every time a proctored exam rolls around.
		Over there, there&apos;s a testing centre where they have you use their computers instead of your own.
		I didn&apos;t know before choosing them as my proctor again this term that we&apos;d need special software on the final exam.
		I didn&apos;t know we&apos;d be using special software for the course at all.
		But if this advice somehow makes it to students not yet taking the course, I recommend they check with their proctor and make sure they&apos;ll have access to R on the date of the exam.
		Having it really helps for the relevant questions.
		The testing centre I use was cool with me installing R, so it worked out for me, but take precautions if you can.
	</p>
	<p>
		Also in regards to the final, don&apos;t panic.
		I panicked.
		The Unit 8 tests had me thinking that the final would mostly be on sampling distributions.
		I was sure I was going to fail, as like I said, I&apos;m certain we didn&apos;t cover the necessary information about them.
		There was only one question about sampling distributions though.
		Get the rest of the problems right, and you&apos;ll be fine.
		That said, I think I missed a couple due to things I know we covered but I forgot the details of, but still, there&apos;s only one guaranteed failure question.
		And if you get help with this early on as I recommended, even that one won&apos;t be a guaranteed failure question.
		I couldn&apos;t get to sleep the night before due to worrying about the test, and had to get up later than planned to make sure I had enough rest for the test.
		Having gotten up late and needing to bike to the next town over, there was no time for breakfast, which couldn&apos;t&apos;ve helped with the test.
		If you&apos;ve been keeping up with the course though, there&apos;s no need to panic.
		Everything will be fine.
	</p>
</section>
END
);
